[slop] fix(workflow-engine): only commit step state after success #5010
Code Review: fix(workflow-engine): only commit step state after success
Overview
This PR fixes a correctness bug in workflow-engine where actor state mutations inside a failing step were persisted even though the step needs to be retried. On retry, the step would execute against incorrectly mutated state. The fix:
The core idea is correct and addresses a real workflow replay bug.
Bug: Breaks stateless actors using workflow steps
This means any actor that uses workflow steps but does not have a
Concern:
| Area | Finding |
|---|---|
| Core logic | Correct - rollback on step failure fixes a real replay bug |
| Bug | Stateless actors with workflow steps crash on RAW_STATE_SYMBOL() before step runs |
| Concern | structuredClone(vars) throws if vars contains non-cloneable objects |
| Minor | Notification fires before flush for exhausted/timeout errors (slightly wider crash window) |
| Test name | 'should not commit step error data' contradicts what the test actually asserts |
| PR hygiene | Description, type checkboxes, and test description are all empty; PR is still in draft |
The core approach is sound. Please address the stateless actor crash (and consider the vars clone concern) before merging.
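For reference, the commit-on-success behavior and the clone concern can be illustrated with a minimal sketch. This is not the PR's code: `runStepWithRollback`, `StepFn`, and `StepResult` are hypothetical names, and the sketch assumes the step state is a plain object that is snapshotted with `structuredClone` before the step runs.

```typescript
// Hypothetical sketch: none of these names come from the PR.
type StepFn<S> = (draft: S) => void;

interface StepResult<S> {
  state: S;
  committed: boolean;
  error?: string;
}

// Run `step` against a deep copy of `state` and only hand back the
// mutated copy when the step completes without throwing.
function runStepWithRollback<S>(state: S, step: StepFn<S>): StepResult<S> {
  // structuredClone throws a DataCloneError for non-cloneable values
  // (functions, for example), so the snapshot itself can fail before
  // the step ever runs.
  const draft = structuredClone(state);
  try {
    step(draft);
    return { state: draft, committed: true };
  } catch (err) {
    // Discard the draft: the caller keeps the pre-step state for the retry.
    return { state, committed: false, error: String(err) };
  }
}
```

With this shape, a step that mutates the draft and then throws leaves the original state untouched for the retry. A state object holding a function would fail at the `structuredClone` call, which is the scenario raised in the concern row of the table.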
PR Review: fix(workflow-engine): only commit step state after success
Overview
Two related correctness fixes in the workflow engine:
Positives
Issues
Misleading test name (clear bug)
The new test is titled
    // The error should be committed for inspection.
    expect(entry.kind.data.error).toBe("Error: step failed");
A better name:
Crash safety regression from
Before this PR, each error branch flushed immediately, persisting
The crash window is:
This window was effectively zero pre-PR. A comment on the error catch block documenting that the caller (outer loop) must flush before sleeping or terminating would make the invariant explicit.
Migration risk from
The old fallback handled partial-write scenarios where the history entry was written but
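The flush invariant requested above could be documented along the following lines. This is a hedged sketch, not the engine's implementation: `History`, `runStep`, and `runLoop` are hypothetical names, and the in-memory `pending` buffer stands in for whatever write batching the engine actually uses.

```typescript
// Hypothetical sketch: the class and function names are illustrative only.
class History {
  private pending: string[] = [];
  readonly durable: string[] = [];

  // Buffer an entry in memory; nothing is persisted yet.
  record(entry: string): void {
    this.pending.push(entry);
  }

  // Persist all buffered entries. Anything still pending at crash time is lost.
  flush(): void {
    this.durable.push(...this.pending);
    this.pending = [];
  }

  get hasPending(): boolean {
    return this.pending.length > 0;
  }
}

function runStep(history: History, step: () => void): boolean {
  try {
    step();
    history.record("step: ok");
    return true;
  } catch (err) {
    // INVARIANT: this branch only buffers the error entry. The caller
    // (outer loop) must call history.flush() before sleeping or terminating.
    history.record(`step: ${String(err)}`);
    return false;
  }
}

function runLoop(history: History, step: () => void, maxAttempts: number): void {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const ok = runStep(history, step);
    history.flush(); // flush before any backoff sleep or exit
    if (ok) return;
  }
}
```

In this sketch the crash window is the gap between `record` in the catch branch and the `flush` in `runLoop`. Keeping the flush as the first statement after `runStep` returns keeps that gap as small as the deferred design allows.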
Test Coverage
Summary
The

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: